Command Line Interface Utilities

Introduction

主要用来管理作业状态

Oozie Command Line Usage

环境变量:

OOZIE_URL: 默认'-oozie' 选项值

OOZIE_TIMEZONE: 默认'-timezone' 选项值'

OOZIE_AUTH: 默认'-auth' 选项值'

'自定义头信息:使用'-Dheader:NAME=VALUE'

oozie version : 显示客户端版本

作业操作:

 oozie job <OPTIONS> : 
    -action <arg>         coordinator rerun/kill on action ids (requires -rerun/-kill); coordinator log retrieval on action ids (requires -log)
    -allruns              Get workflow jobs corresponding to a coordinator action
                          including all the reruns
    -auth <arg>           select authentication type [SIMPLE|KERBEROS]
    -change <arg>         change a coordinator/bundle job
    -config <arg>         job configuration file '.xml' or '.properties'
    -D <property=value>   set/override value for given property
    -date <arg>           coordinator/bundle rerun on action dates (requires -rerun)
    -definition <arg>     job definition
    -doas <arg>           doAs user, impersonates as the specified user
    -dryrun               Dryrun a workflow (since 3.3.2) or coordinator (since 2.0) job without actually executing it
    -info <arg>           info of a job
    -kill <arg>           kill a job (coordinator requires -action or -date)
    -len <arg>            number of actions to be returned, used for pagination(default 1000, requires -info)
    -filter <arg>         All coordinator actions satisfying the filter will be retrieved.
                          Filter is of the format <key><comparator><value>[;<key><comparator><value>]*
                          key: status or nominaltime
                          comparator: =, !=, <, <=, >, >=
                          value: valid status like SUCCEEDED, KILLED, RUNNING etc. Only = and != apply for status
                          nominalTime is valid date of the format yyyy-MM-dd'T'HH:mm'Z' (like 2014-06-01T00:00Z)
                          Filter with '=' is concatenated with 'OR' and other filters are concatenated with 'AND'.
                          Currently, supported only for coordinator job.
    -order <arg>          order to show coordinator actions (default ascending order, 'desc' for descending order, requires -info)
                          Currently, only supported for coordinator job.
    -localtime            use local time (same as passing your time zone to -timezone).
                          Overrides -timezone option
    -log <arg>            job log
    -nocleanup            do not clean up output-events of the coordinator rerun actions
                          (requires -rerun)
    -offset <arg>         offset of actions returned relative to all actions matching the filter criteria,
                          used for pagination (default '1', requires -info)
    -oozie <arg>          Oozie URL
    -refresh              re-materialize the coordinator rerun actions (requires -rerun)
    -rerun <arg>          rerun a job  (coordinator requires -action or -date; bundle requires -coordinator or -date)
    -resume <arg>         resume a job
    -run                  run a job
    -start <arg>          start a job
    -submit               submit a job
    -suspend <arg>        suspend a job
    -timezone <arg>       use time zone with the specified ID (default GMT).
                          See 'oozie info -timezones' for a list
    -value <arg>          new endtime/concurrency/pausetime value for changing a
                          coordinator job; new pausetime value for changing a bundle job
    -verbose              verbose mode
    -update               Update coordinator definition and properties
    -logfilter            job log search parameter. Can be specified as -logfilter opt1=val1;opt2=val1;opt3=val1.
                          Supported options are recent, start, end, loglevel, text, limit and debug.
    -ignore <arg>         ignore a coordinator job or action
                          (requires '-action' to

作业状态:

  oozie jobs <OPTIONS> : 
        -auth <arg>         选择验证类型 [SIMPLE|KERBEROS]
        -doas <arg>         指定用户操作
        -oozie <arg>         Oozie URL
        -filter <arg>        user=<U>\;name=<N>\;group=<G>\;status=<S>\;...
        -jobtype <arg>       作业类型(仅支持Oozie-2.0或之后版本ONLY - coordinator' 或 'wf' (default))
        -len <arg>           作业数 (default '100')
        -localtime           本地时间,将覆盖-timezone 选项
        -offset <arg>        jobs offset (default '1')
        -timezone <arg>      指定时区
        -verbose             verbose mode

检查多个工作流作业状态:
    $ oozie jobs -oozie http://localhost:11000/oozie -localtime -len 2 -filter status=RUNNING
    Job Id                          Workflow Name         Status     Run  User      Group     Created                Started                 Ended
    .----------------------------------------------------------------------------------------------------------------------------------------------------------------
    4-20090527151008-oozie-joe     hadoopel-wf           RUNNING    0    joe      other     2009-05-27 15:34 +0530 2009-05-27 15:34 +0530  -
    0-20090527151008-oozie-joe     hadoopel-wf           RUNNING    0    joe      other     2009-05-27 15:15 +0530 2009-05-27 15:15 +0530  -

    可用filter有:
        name: the workflow application name from the workflow definition.
        user: the user that submitted the job.
        group: the group for the job.
        status: the status of the job.
        frequency: the frequency of the Coordinator job.
        unit: the time unit. It can take one of the following four values: months, days, hours or minutes. Time unit should be added only when frequency is specified.

检查多个Coordinator 作业状态:
    $ oozie jobs -oozie http://localhost:11000/oozie -jobtype coordinator
检查多个bundle作业状态:
    $ oozie jobs -oozie http://localhost:11000/oozie -jobtype bundle
bulk监控用于Bundle运行的作业
    $ oozie jobs -oozie http://localhost:11000/oozie -bulk bundle=bundle-app-1
    $ oozie jobs -oozie http://localhost:11000/oozie -bulk 'bundle=bundle-app-2;actionstatus=SUCCEEDED' -verbose
        bundle: Bundle app-name (mandatory)
        coordinators: multiple, comma-separated Coordinator app-names. By default, if none specified, all coordinators pertaining to the bundle are included
        actionstatus: status of jobs (default job-type is coordinator action aka workflow job) you wish to filter on. Default value is (KILLED,FAILED)
        startcreatedtime/endcreatedtime: specify boundaries for the created-time. Only jobs created within this window are included.
        startscheduledtime/endscheduledtime: specify boundaries for the nominal-time. Only jobs scheduled to run within this window are included.
    $ oozie jobs <OPTIONS> : jobs status
         -bulk <arg>       key-value pairs to filter bulk jobs response. e.g.
                           bundle=<B>\;coordinators=<C>\;actionstatus=<S>\;
                           startcreatedtime=<SC>\;endcreatedtime=<EC>\;
                           startscheduledtime=<SS>\;endscheduledtime=<ES>\;
                           coordinators and actionstatus can be multiple comma
                           separated values. "bundle" and "coordinators" are 'names' of those jobs.
                           Bundle name is mandatory, other params are optional

管理员操作:

oozie admin <OPTIONS> : 
    -auth <arg>         选择验证类型 [SIMPLE|KERBEROS]
    -doas <arg>         指定用户操作
    -oozie <arg>        Oozie URL       
    -queuedump          显示Oozie服务器队列元素
    -servers            可用Oozie服务器列表(启用HA时,超过一个)
    -status             当前系统状态
    -systemmode <arg>   支持Oozie-2.0或之后版本.改变系统模式[NORMAL|                                 NOWEBSERVICE|SAFEMODE]
    -version            显示Oozie服务器编译版本
    -shareliblist       列出指定工作流活动的可用共享库
    -sharelibupdate     更新服务器使用新的共享库版本

检查Oozie系统状态     
    $ oozie admin -oozie http://localhost:11000/oozie -status
        Safemode: OFF
改变Oozie系统状态
    $ oozie admin -oozie http://localhost:11000/oozie -systemmode [NORMAL|NOWEBSERVICE|SAFEMODE]
        Safemode: ON
显示Oozie系统编译版本
    $ oozie admin -oozie http://localhost:11000/oozie -version
        Oozie server build version: 2.0.2.1-0.20.1.3092118008--
显示Oozie系统队列:用于管理员调试目的
    $ oozie admin -oozie http://localhost:11000/oozie -queuedump[Server Queue Dump]:(coord_action_start,1),(coord_action_start,1),(coord_action_start,1)(coord_action_ready,1)(action.check,2)
显示Oozie服务器列表:(HA)
    $ oozie admin -oozie http://hostA:11000/oozie -servers
        hostA : http://hostA:11000/oozie
        hostB : http://hostB:11000/oozie
        hostC : http://hostC:11000/oozie

SLA Operations 获取SLA 事件列表 $ oozie sla -oozie http://localhost:11000/oozie -len 3 获取特定序列SLA事件 $ oozie sla -oozie http://localhost:11000/oozie -offset 2 -len 1 使用过滤获取SLA事件信息 $ oozie sla -filter jobid=0000000-130130150445097-oozie-joe-C@1\;appname=aggregator-sla-app -len 1 -oozie http://localhost:11000/oozie 可用filter有:jobid,appname

验证操作:

验证WF xml文件:
$ oozie validate myApp/workflow.xml

可用共享库列表
$ oozie admin -oozie http://localhost:11000/oozie -shareliblist
$ oozie admin -oozie http://localhost:11000/oozie -sharelib pig*

更新共享库
$ oozie admin -oozie http://localhost:11000/oozie -sharelibupdate

指定主题详细信息:

  oozie info <OPTIONS> : 
            -timezones   display a list of available time zones
获取时区列表:
    $ oozie info -timezones

Pig作业:

  oozie pig <OPTIONS> -X <ARGS> : 
        '-X'后的参数将被提交给pig, '-X'后'-D'  参数将放入<configuration>中
            -auth <arg>           选择验证类型 [SIMPLE|KERBEROS]
            -config <arg>         作业配置文件,文件名后缀'.properties'
            -D <property=value>   设置或覆盖给定属性的值
            -doas <arg>           指定用户操作
            -oozie <arg>          Oozie URL

            -file <arg>           Pig script
            -P <property=value>   为脚本设置参数

例子:

$ oozie pig -file pigScriptFile -config job.properties -Dfs.default.name=hdfs://localhost:8020 -PINPUT=/user/me/in -POUTPUT=/user/me/out -X -Dmapred.job.queue.name=UserQueue -param_file params
.
job: 14-20090525161321-oozie-joe-W
.
$cat job.properties
fs.default.name=hdfs://localhost:8020
mapreduce.jobtracker.kerberos.principal=ccc
dfs.namenode.kerberos.principal=ddd
oozie.libpath=hdfs://localhost:8020/user/oozie/pig/lib/

Hive Operations

  oozie hive <OPTIONS> -X<ARGS> : 
        '-X'后的参数将被提交给 hive, '-X'后'-D'  参数将放入<configuration>中
             -auth <arg>           选择验证类型 [SIMPLE|KERBEROS]
             -config <arg>         作业配置文件,文件名后缀'.properties'
             -D <property=value>   设置或覆盖给定属性的值
             -doas <arg>           指定用户操作
             -oozie <arg>          Oozie URL
             -file <arg>           hive script
             -P <property=value>   set parameters for script

$ oozie hive -file HIVE-SCRIPT -config OOZIE-CONFIG [-Dkey=value] [-Pkey=value] [-X [-Dkey=value opts for Launcher/Job configuration] [Other opts to pass to Hive]]

例子:

$ oozie hive -file hiveScriptFile -config job.properties -Dfs.default.name=hdfs://localhost:8020 -PINPUT=/user/me/in -POUTPUT=/user/me/out -X -Dmapred.job.queue.name=UserQueue -v
    job: 14-20090525161321-oozie-joe-W
$cat job.properties
fs.default.name=hdfs://localhost:8020
mapreduce.jobtracker.kerberos.principal=ccc
dfs.namenode.kerberos.principal=ddd
oozie.libpath=hdfs://localhost:8020/user/oozie/hive/lib/

Sqoop操作

  oozie sqoop <OPTIONS> -X<ARGS> : 
        '-X'后'-D'  参数将放入<configuration>中
             -auth <arg>           选择验证类型 [SIMPLE|KERBEROS]
             -config <arg>         作业配置文件,文件名后缀'.properties'
             -D <property=value>   设置或覆盖给定属性的值
             -doas <arg>           指定用户操作
             -oozie <arg>          Oozie URL
             -command <arg>        sqoop command

例子:

$ oozie sqoop -oozie http://localhost:11000/oozie -Dfs.default.name=hdfs://localhost:8020 -command import --connect jdbc:mysql://localhost:3306/oozie --username oozie --password oozie --table WF_JOBS --target-dir '/user/${wf:user()}/${examplesRoot}/output-data/sqoop' -m 1 -config job.properties -X -Dmapred.job.queue.name=default
.
job: 14-20090525161322-oozie-joe-W
.

自由格式例子:

$ oozie sqoop -oozie http://localhost:11000/oozie -command import --connect jdbc:mysql://localhost:3306/oozie --username oozie --password oozie --query "SELECT a.id FROM WF_JOBS a WHERE \$CONDITIONS" --target-dir '/user/${wf:user()}/${examplesRoot}/output-data/sqoop' -m 1 -config job.properties -X -Dmapred.job.queue.name=default
.
job: 14-20090525161321-oozie-joe-W
.
$cat job.properties
fs.default.name=hdfs://localhost:8020
mapreduce.jobtracker.kerberos.principal=ccc
dfs.namenode.kerberos.principal=ddd
oozie.libpath=hdfs://localhost:8020/user/oozie/sqoop/lib/

MR操作:

  oozie mapreduce <OPTIONS> : 
                  -auth <arg>           选择验证类型 [SIMPLE|KERBEROS]
                  -config <arg>         作业配置文件,文件名后缀'.properties'
                  -D <property=value>   设置或覆盖给定属性的值
                  -doas <arg>           指定用户操作
                  -oozie <arg>          Oozie URL

$ oozie mapreduce -oozie http://localhost:11000/oozie -config job.properties

配置文件中必须指定:

mapred.mapper.class ,mapred.reducer.class , mapred.input.dir ,mapred.output.dir , oozie.libpath, mapred.job.tracker , fs.default.name

Common CLI Options

Authentication

Impersonation, doAs

Oozie URL

-oozie OOZIE_URL指定运行Oozie的系统,OOZIE_URL可以在环境变量中设置。如果都不提供,CLI fail

Time zone

-timezone TIME_ZONE_ID 指定时区;oozie info -timezones获取可用时区列表; -localtime-timezone TIME_ZONE_ID 都指定,则-localtime 将覆盖后者; 均未给定,这将查找 OOZIE_TIMEZONE环境变量; 上述均为给定,或非法时区,将使用GMT

Debug Mode

OOZIE_DEBUG=1 在CLI将输出详细任意执行的详细信息

CLI retry

Job Operations

提交作业

$ oozie job -oozie http://localhost:11000/oozie -config job.properties -submit
job: 14-20090525161321-oozie-joe

-config 指定xml配置文件;并且wf必须指定oozie.wf.application.path属性;coordinator 指定oozie.coord.application.path属性;bundle 指定oozie.bundle.application.path;并且指定路径必须是一个HDFS路径。当前作业处于 PREP状态

启动作业

$ oozie job -oozie http://localhost:11000/oozie -start 14-20090525161321-oozie-joe

作业将处于 PREP状态

运行作业

$ oozie job -oozie http://localhost:11000/oozie -config job.properties -run

job: 15-20090525161321-oozie-joe

run选项用来创建和启动作业,作业参数必须在属性文件中提供,并指定 -config选项; -config 指定xml配置文件;并且wf必须指定oozie.wf.application.path属性;coordinator 指定oozie.coord.application.path属性;bundle 指定oozie.bundle.application.path;并且指定路径必须是一个HDFS路径。作业将处于 RUNNING状态;

暂停作业

$ oozie job -oozie http://localhost:11000/oozie -suspend 14-20090525161321-oozie-joe

作业将处于 SUSPENDED状态;暂停 coordinator/bundle作业的 RUNNING , RUNNIINGWITHERROR or PREP状态. The suspend option suspends a coordinator/bundle job in RUNNING , RUNNIINGWITHERROR or PREP status. When the coordinator job is suspended, running coordinator actions will stay in running and the workflows will be suspended. If the coordinator job is in RUNNING status, it will transit to SUSPENDED status; if it is in RUNNINGWITHERROR status, it will transit to SUSPENDEDWITHERROR ; if it is in PREP status, it will transit to PREPSUSPENDED status.

When the bundle job is suspended, running coordinators will be suspended. If the bundle job is in RUNNING status, it will transit to SUSPENDED status; if it is in RUNNINGWITHERROR status, it will transit to SUSPENDEDWITHERROR ; if it is in PREP status, it will transit to PREPSUSPENDED status.

恢复作业

$ oozie job -oozie http://localhost:11000/oozie -resume 14-20090525161321-oozie-joe

The suspend option suspends a coordinator/bundle job in SUSPENDED , SUSPENDEDWITHERROR or PREPSUSPENDED status. If the coordinator job is in SUSPENDED status, it will transit to RUNNING status; if it is in SUSPENDEDWITHERROR status, it will transit to RUNNINGWITHERROR ; if it is in PREPSUSPENDED status, it will transit to PREP status.

When the coordinator job is resumed it will create all the coordinator actions that should have been created during the time it was suspended, actions will not be lost, they will delayed.

When the bundle job is resumed, suspended coordinators will resume running. If the bundle job is in SUSPENDED status, it will transit to RUNNING status; if it is in SUSPENDEDWITHERROR status, it will transit to RUNNINGWITHERROR ; if it is in PREPSUSPENDED status, it will transit to PREP status.

Kill作业:KILLED

$ oozie job -oozie http://localhost:11000/oozie -kill 14-20090525161321-oozie-joe

Kill Coordinator 活动或多个活动

$oozie job -kill <coord_Job_id> [-action 1, 3-4, 7-40] [-date 2009-01-01T01:00Z::2009-05-31T23:59Z, 2009-11-10T01:00Z, 2009-12-31T22:00Z]

The kill option here for a range of coordinator actions kills a non-terminal (=RUNNING=, WAITING , READY , SUSPENDED ) coordinator action when coordinator job is not in FAILED or KILLED state.
Either -action or -date should be given.
If neither -action nor -date is given, the exception will be thrown. Also if BOTH -action and -date are given, an error will be thrown.
Multiple ranges can be used in -action or -date. See the above example.
If one of the actions in the given list of -action is already in terminal state, the output of this command will only include the other actions.
The dates specified in -date must be UTC.
Single date specified in -date must be able to find an action with matched nominal time to be effective.
After the command is executed the killed coordiator action will have KILLED status.

改变Coordinator作业 endtime/concurrency/pausetime/status ####

Changing endtime/pausetime of a Bundle Job
Changing endtime/pausetime of a Bundle Job
Rerunning a Workflow Job
Rerunning a Coordinator Action or Multiple Actions
Rerunning a Bundle Job
Checking the Information and Status of a Workflow, Coordinator or Bundle Job or a Coordinator Action Listing all the Workflows for a Coordinator Action
Checking the xml definition of a Workflow, Coordinator or Bundle Job
Checking the server logs of a Workflow, Coordinator or Bundle Job
Checking the server logs for particular actions of a Coordinator Job
Filtering the server logs with logfilter options
Dryrun of Coordinator Job
Dryrun of Workflow Job
Updating coordinator definition and properties
Ignore a Coordinator Job
Ignore a Coordinator Action or Multiple Coordinator Actions

    Jobs Operations
        Checking the Status of multiple Workflow Jobs
        Checking the Status of multiple Coordinator Jobs
        Checking the Status of multiple Bundle Jobs
        Bulk monitoring for jobs launched via Bundles
    Admin Operations
        Checking the Status of the Oozie System
        Changing the Status of the Oozie System
        Displaying the Build Version of the Oozie System
        Displaying the queue dump of the Oozie System
        Displaying the list of available Oozie Servers
    Validate Operations
        Validating a Workflow XML
        Getting list of available sharelib
        Update system sharelib

SLA Operations Getting a list of SLA events Getting the SLA event with particular sequenceID Getting information about SLA events using filter